Univariate Stock Predictions: LSTM, ARIMA, and Prophet

INFO 523 - Final Project

Project description
Author: Matt Osterhoudt

Affiliation: College of Information Science, University of Arizona

Abstract

Time series models can be used to predict stock prices from historical closing values. Importing stock data from Yahoo Finance, we predict MSFT, SPY, and SWPPX closing prices and compare three models: ARIMA (AutoRegressive Integrated Moving Average), LSTM (Long Short-Term Memory neural network), and Facebook Prophet. Each model has its strengths, but the LSTM model proved the most effective, with a high R-squared value across the three stocks (average of 0.987). The LSTM model also had significantly lower Mean Squared Error and Mean Absolute Error than ARIMA and Prophet. This performance suggests that LSTM may be more effective for moderately long stock predictions and illustrates the power of recurrent neural networks.

Introduction/Question

Driving question: Which time series model (ARIMA, LSTM, or Prophet) is best for univariate (daily closing price) stock price prediction? This project aims to identify the most effective model using three different stock datasets: MSFT (Microsoft), SPY (SPDR S&P 500 ETF Trust), and SWPPX (Schwab S&P 500 Index Fund). The data range from the beginning of 2015 to the end of 2024. The only variables used are the closing price and the date index. “Close” is the closing price of the stock for each trading day, and the date index is in yyyy-mm-dd format. By developing these models, our objective is to clarify which model is best suited to predicting stock prices.

Approach

First, I extracted the stock data using the yfinance package, a Python library that retrieves stock data from Yahoo Finance. I extracted daily historical data from 2015 to 2024 and retained the “Close” price series and the date index for univariate analysis. I did not deem it necessary to preprocess much of the data: there were no outliers I wanted to remove, and the only missing dates were holidays and weekends, which is expected for market data. Feature scaling and normalization were performed within individual time series models where necessary. I selected these time series models to deepen my understanding of time series analysis.

ARIMA Approach & Analysis

First, I chose ARIMA (AutoRegressive Integrated Moving Average) for its ability to capture autocorrelation and patterns in the data after differencing is applied. ARIMA assumes a stationary series, so I applied the ADF (Augmented Dickey-Fuller) test to both the raw and the differenced data. If the p-value is less than the significance level (0.05), we may reject the null hypothesis, implying that the time series is stationary. If it is higher than the significance level, we difference the series and test again to find the order of differencing (d). For all three stocks, the raw series was non-stationary (p-values of roughly 0.78 to 0.85) and the first difference was stationary (p-values below 1e-25).

ARIMA also requires ACF (autocorrelation) and PACF (partial autocorrelation) analysis to select the p and q lags. Both functions are plotted to aid the selection: the PACF guides the autoregressive order p, and the ACF guides the moving-average order q. Choosing these parameters (p, d, q) determines how the ARIMA model is configured and fitted. Each stock’s data was split 80/20 for training and testing. Predicted values were overlaid on the actual test data for visual inspection and confirmation.

The ARIMA model did not perform as well as I expected. Despite the p, d, and q selection tests, my model fell quite short. For all three stocks, it plotted a nearly linear “prediction” that seems to have simply taken an average. This tells me that the model failed to make any meaningful predictions, or that seasonality played an unexpected role. Limitations and future work: this could be addressed by using SARIMA or an automated order search such as auto-ARIMA. I can also check my model more thoroughly for anything extraneous.
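One plausible explanation for the near-linear forecasts, offered as a hedged illustration and not as a diagnosis of the fitted models: when a model with d = 1 forecasts many steps ahead without seeing new observations, the predicted daily changes decay toward their long-run mean, so the price forecast either levels off or follows a straight line. The sketch below shows this for a hypothetical ARIMA(1,1,0) process. The function name, the coefficient `phi`, and all starting values are assumptions chosen for illustration; none of them are taken from the stock data.

```python
def arima_110_forecast(last_price, last_diff, phi, steps, drift=0.0):
    """Expected multi-step path of a hypothetical ARIMA(1,1,0) model.

    The daily change d_t is assumed to follow
        d_t - drift = phi * (d_{t-1} - drift) + noise,  with |phi| < 1.
    Future noise has mean zero, so it drops out of the forecast.
    """
    price, diff = last_price, last_diff
    path = []
    for _ in range(steps):
        diff = drift + phi * (diff - drift)  # expected next daily change
        price += diff                        # integrate back to a price level
        path.append(price)
    return path


# Illustrative values only (not estimates from the stock data).
flat_path = arima_110_forecast(last_price=100.0, last_diff=2.0, phi=0.5, steps=30)
trend_path = arima_110_forecast(last_price=100.0, last_diff=2.0, phi=0.5,
                                steps=30, drift=0.1)

# Size of the final predicted daily change in each case.
flat_last_step = flat_path[-1] - flat_path[-2]     # shrinks toward 0: forecast levels off
trend_last_step = trend_path[-1] - trend_path[-2]  # settles at the drift: straight line
```

Under these assumptions the forecast stops reacting to the last observed move after a few steps, which is consistent with a flat or straight-line prediction over a test window of several hundred days.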

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2022-12-29 00:00:00-05:00 to 2024-12-31 00:00:00-05:00
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Close   504 non-null    float64
dtypes: float64(1)
memory usage: 7.9 KB
p-value pre-difference: 0.8514689712593815
p-value post-difference: 8.881046092983297e-28
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                  Close   No. Observations:                 2012
Model:                 ARIMA(9, 1, 9)   Log Likelihood               -5023.279
Date:                Thu, 21 Aug 2025   AIC                          10086.558
Time:                        05:39:58   BIC                          10198.686
Sample:                             0   HQIC                         10127.718
                               - 2012                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0972      0.048      2.018      0.044       0.003       0.192
ar.L1          0.0388      0.072      0.537      0.592      -0.103       0.181
ar.L2         -0.1230      0.070     -1.754      0.079      -0.260       0.014
ar.L3          0.1223      0.067      1.813      0.070      -0.010       0.255
ar.L4         -0.1085      0.058     -1.876      0.061      -0.222       0.005
ar.L5         -0.0764      0.067     -1.145      0.252      -0.207       0.054
ar.L6         -0.0956      0.065     -1.465      0.143      -0.223       0.032
ar.L7          0.1163      0.064      1.813      0.070      -0.009       0.242
ar.L8          0.0843      0.058      1.457      0.145      -0.029       0.198
ar.L9          0.6355      0.064      9.874      0.000       0.509       0.762
ma.L1         -0.1355      0.076     -1.785      0.074      -0.284       0.013
ma.L2          0.1258      0.073      1.713      0.087      -0.018       0.270
ma.L3         -0.1657      0.069     -2.386      0.017      -0.302      -0.030
ma.L4          0.1221      0.063      1.929      0.054      -0.002       0.246
ma.L5          0.0785      0.071      1.112      0.266      -0.060       0.217
ma.L6          0.0180      0.069      0.262      0.793      -0.117       0.153
ma.L7         -0.0905      0.069     -1.316      0.188      -0.225       0.044
ma.L8         -0.1726      0.062     -2.801      0.005      -0.293      -0.052
ma.L9         -0.5112      0.071     -7.219      0.000      -0.650      -0.372
sigma2         8.6477      0.139     62.208      0.000       8.375       8.920
===================================================================================
Ljung-Box (L1) (Q):                   0.12   Jarque-Bera (JB):              4172.29
Prob(Q):                              0.73   Prob(JB):                         0.00
Heteroskedasticity (H):              44.74   Skew:                            -0.55
Prob(H) (two-sided):                  0.00   Kurtosis:                         9.97
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

p-value pre-difference: 0.7797882508303855
p-value post-difference: 3.723409574924781e-26
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                  Close   No. Observations:                 2012
Model:                 ARIMA(9, 1, 6)   Log Likelihood               -5248.560
Date:                Thu, 21 Aug 2025   AIC                          10531.121
Time:                        05:40:08   BIC                          10626.429
Sample:                             0   HQIC                         10566.106
                               - 2012                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0975      0.077      1.271      0.204      -0.053       0.248
ar.L1         -1.0019      0.193     -5.191      0.000      -1.380      -0.624
ar.L2         -0.1904      0.137     -1.386      0.166      -0.460       0.079
ar.L3          0.1376      0.099      1.385      0.166      -0.057       0.332
ar.L4         -0.5948      0.084     -7.048      0.000      -0.760      -0.429
ar.L5         -1.0291      0.169     -6.082      0.000      -1.361      -0.698
ar.L6         -0.6124      0.121     -5.064      0.000      -0.849      -0.375
ar.L7          0.0267      0.022      1.206      0.228      -0.017       0.070
ar.L8         -0.0182      0.027     -0.667      0.505      -0.072       0.035
ar.L9          0.0242      0.028      0.878      0.380      -0.030       0.078
ma.L1          0.9433      0.192      4.906      0.000       0.566       1.320
ma.L2          0.1488      0.129      1.150      0.250      -0.105       0.402
ma.L3         -0.1233      0.097     -1.266      0.205      -0.314       0.068
ma.L4          0.5772      0.080      7.213      0.000       0.420       0.734
ma.L5          0.9896      0.160      6.166      0.000       0.675       1.304
ma.L6          0.5241      0.109      4.801      0.000       0.310       0.738
sigma2        10.8144      0.184     58.919      0.000      10.455      11.174
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):              3777.07
Prob(Q):                              0.97   Prob(JB):                         0.00
Heteroskedasticity (H):               9.58   Skew:                            -0.77
Prob(H) (two-sided):                  0.00   Kurtosis:                         9.53
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

p-value pre-difference: 0.8497185184511583
p-value post-difference: 3.048210521686816e-26
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                  Close   No. Observations:                 2012
Model:                 ARIMA(2, 1, 2)   Log Likelihood                2245.279
Date:                Thu, 21 Aug 2025   AIC                          -4478.558
Time:                        05:40:13   BIC                          -4444.920
Sample:                             0   HQIC                         -4466.211
                               - 2012                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0032      0.002      1.882      0.060      -0.000       0.006
ar.L1         -1.7502      0.021    -82.449      0.000      -1.792      -1.709
ar.L2         -0.8770      0.020    -43.824      0.000      -0.916      -0.838
ma.L1          1.6731      0.027     62.116      0.000       1.620       1.726
ma.L2          0.7825      0.026     30.546      0.000       0.732       0.833
sigma2         0.0063   8.11e-05     77.116      0.000       0.006       0.006
===================================================================================
Ljung-Box (L1) (Q):                   0.16   Jarque-Bera (JB):              9729.79
Prob(Q):                              0.69   Prob(JB):                         0.00
Heteroskedasticity (H):              16.27   Skew:                            -0.05
Prob(H) (two-sided):                  0.00   Kurtosis:                        13.78
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

LSTM Approach & Analysis

Next, I selected LSTM for its capability of modeling sequential data (in this case, a stock price time series). LSTM is not a linear model; instead, it learns patterns through its internal cell states. I normalized the data for training efficiency and to prevent scaling issues.

I used TensorFlow with a 60-day lookback window. In simple terms, the LSTM model uses the previous 60 days to predict a single day’s stock price, and this window is slid over the entire series. The sequential model consists of an LSTM layer with 50 units (internal memory cells), a dropout layer, a dense layer of 8 neurons using the Rectified Linear Unit (ReLU) activation function, and a single-neuron output layer. The model is trained with a batch size of 32 samples and passes through the training set 20 times (20 epochs). The data was partitioned 80/20 for training and testing. Because the model is trained over several epochs, I also included a ModelCheckpoint callback that keeps the best-performing model.

I am not (yet) particularly well-versed in neural networks. Many of the hyperparameter choices (20 epochs, batch size of 32, 50 LSTM units, etc.) are common values I selected based on conventional practice.

The models performed very well. Visually, the predicted values are very closely aligned with the actual values. I believe the LSTM models tracked the data so closely because each prediction covers only one day ahead and draws on the most recent 60 days.
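The 60-day windowing and 80/20 split described above can be sketched with NumPy alone. This is a minimal illustration on a synthetic series, not the project's code: the helper names, fitting the scaler on the training portion only, and prepending the last 60 training days to the test input are all assumptions.

```python
import numpy as np


def minmax_scale(values, lo, hi):
    """Scale values to [0, 1] using bounds taken from the training data."""
    return (values - lo) / (hi - lo)


def make_windows(series, lookback=60):
    """Turn a 1-D series into (samples, lookback, 1) inputs and next-day targets."""
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., np.newaxis], y


# Synthetic stand-in for a closing-price series (not real stock data).
close = np.linspace(100.0, 200.0, 300)

split = int(len(close) * 0.8)      # chronological 80/20 split, no shuffling
train, test = close[:split], close[split:]
lo, hi = train.min(), train.max()  # assumption: scaler is fit on training data only

X_train, y_train = make_windows(minmax_scale(train, lo, hi))

# Assumption: prepend the last 60 training days so the first test day has a full window.
test_input = minmax_scale(close[split - 60:], lo, hi)
X_test, y_test = make_windows(test_input)
```

Each row of `X_train` holds 60 consecutive scaled prices and the matching entry of `y_train` is the price on the following day, which is the input shape a Keras LSTM layer expects for a single feature.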

Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm (LSTM)                     │ (None, 50)             │        10,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 50)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 8)              │           408 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 1)              │             9 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 10,817 (42.25 KB)
 Trainable params: 10,817 (42.25 KB)
 Non-trainable params: 0 (0.00 B)

Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm_1 (LSTM)                   │ (None, 50)             │        10,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 50)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 8)              │           408 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 1)              │             9 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 10,817 (42.25 KB)
 Trainable params: 10,817 (42.25 KB)
 Non-trainable params: 0 (0.00 B)

Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm_2 (LSTM)                   │ (None, 50)             │        10,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_2 (Dropout)             │ (None, 50)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_4 (Dense)                 │ (None, 8)              │           408 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_5 (Dense)                 │ (None, 1)              │             9 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 10,817 (42.25 KB)
 Trainable params: 10,817 (42.25 KB)
 Non-trainable params: 0 (0.00 B)
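
The parameter counts in the model summaries above can be reproduced by hand. The sketch below uses the standard counting formulas for LSTM and dense layers; the input dimension of 1 (one closing price per time step) and 4-byte float32 weights are assumptions, chosen because they match the reported totals.

```python
def lstm_params(units, input_dim):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


layers = {
    "lstm": lstm_params(units=50, input_dim=1),  # one feature per time step
    "dropout": 0,                                # dropout has no weights
    "dense": dense_params(50, 8),
    "dense_1": dense_params(8, 1),
}
total = sum(layers.values())
size_kb = total * 4 / 1024  # assumes 4-byte float32 weights
```

Note that the 60-day window length does not appear in the count, because the same LSTM weights are reused at every time step.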

Prophet Approach & Analysis

Finally, the last model I implemented is Prophet. Prophet expects a data frame with two columns: “ds” (timestamps, in other words the date) and “y” (the target variable, in my case the daily closing price). For the most part, my data was already in this format; the only step I had to add later was stripping the timezone. Unlike LSTM, Prophet is an additive model with a linear trend. Seasonality is a feature that this model explicitly anticipates. Our stock data is daily over around 10 years, so I set the yearly seasonality to true and the weekly and daily seasonality to false. Another setting I used is “freq = 'B'” (business-day frequency), which skips weekends when generating future dates; it does not remove market holidays on its own.

The performance of this model was interesting. Prophet’s predictive power seemed potent only for trends it had already observed. For example, for SWPPX and SPY I partitioned the data 80/20. The boundary between the training and test sets happens to fall at the tail end of a decrease in stock price. The model saw the decreasing trend and continued to predict a decrease. For MSFT, however, I changed the split to 90/10, so that the training data included part of the new upward trend from 2023 onward. The MSFT model instead projected upward, indicating that the model relied more on short-term trends to predict values.

In addition, Prophet includes components that can be plotted. I plotted the trend and yearly movements as well.
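The data preparation described above (renaming to “ds” and “y”, stripping the timezone, and using a business-day frequency) can be sketched with pandas alone. The prices below are made up for illustration and the variable names are hypothetical; Prophet itself is not imported here. In Prophet, a frequency like this is normally passed to `make_future_dataframe(periods=..., freq="B")`.

```python
import pandas as pd

# Synthetic stand-in for downloaded data: a timezone-aware daily index.
idx = pd.date_range("2024-12-16", periods=5, freq="B", tz="America/New_York")
prices = pd.DataFrame({"Close": [100.0, 101.5, 99.8, 102.3, 103.1]}, index=idx)

# Prophet expects columns named "ds" (timezone-naive timestamps) and "y".
prophet_df = pd.DataFrame({
    "ds": prices.index.tz_localize(None),  # strip the timezone, keep local dates
    "y": prices["Close"].to_numpy(),
})

# Business-day frequency skips Saturdays and Sundays only.
future_days = pd.date_range("2024-12-23", periods=5, freq="B")
has_weekend = bool((future_days.dayofweek >= 5).any())
includes_christmas = pd.Timestamp("2024-12-25") in future_days
```

In this sketch `has_weekend` is false while `includes_christmas` is true, which shows that a business-day frequency leaves market holidays in the forecast dates unless they are filtered out separately.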

Discussion & Model Comparison

I computed the R-squared, Mean Squared Error (MSE), Mean Absolute Error (MAE), and the normalized MSE and MAE (NMSE and NMAE, respectively). As shown in the table below, these values measure the predictive accuracy of each model on its test set. LSTM proved to be the most effective model, with very low MSE, MAE, NMSE, and NMAE values across all 3 stocks. It also had an R-squared value of almost 1.00 for each of the 3 stocks. Prophet and ARIMA both struggled to predict stock values effectively. They both had higher MSE and MAE values, and the R-squared values for both ARIMA and Prophet were negative for every stock, meaning their predictions were worse than simply predicting the mean of the test data. Note that the MSFT Prophet model was evaluated on a 90/10 split, so its metrics are computed on a different test window than the other models. These results suggest that ARIMA may have struggled with nonstationarity (possibly an oversight on my end) or seasonality, whereas Prophet may have failed due to overfitting to the training data or a failure to capture market volatility.

Stock   Model    MSE         MAE       R-squared   NMSE     NMAE
MSFT    Prophet  4591.333    65.959    -12.971     11.021   0.158
MSFT    ARIMA    13585.598   105.661   -2.431      37.437   0.291
MSFT    LSTM     63.961      6.459     0.982       0.018    0.018
SWPPX   Prophet  18.707      3.719     -4.200      1.693    0.337
SWPPX   ARIMA    8.303       2.365     -1.056      0.742    0.211
SWPPX   LSTM     0.033       0.132     0.992       0.008    0.012
SPY     Prophet  21337.151   122.298   -4.163      45.381   0.260
SPY     ARIMA    10072.653   84.816    -1.219      21.212   0.179
SPY     LSTM     63.426      6.268     0.986       0.014    0.013
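
As a reference for how metrics like those in the table can be computed, here is a minimal NumPy sketch. The text does not state how NMSE and NMAE were normalized, so dividing by the mean of the actual values is an assumption made here for illustration; the function name `evaluate` and the toy numbers are hypothetical too.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return MSE, MAE, R-squared, and mean-normalized MSE and MAE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mse = float(np.mean(errors ** 2))
    mae = float(np.mean(np.abs(errors)))
    ss_res = float(np.sum(errors ** 2))                    # model's squared error
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # error of predicting the mean
    scale = float(y_true.mean())                           # assumed normalizer

    return {
        "MSE": mse,
        "MAE": mae,
        "R2": 1.0 - ss_res / ss_tot,
        "NMSE": mse / scale,
        "NMAE": mae / scale,
    }


actual = [10.0, 12.0, 14.0, 16.0]             # toy values, not stock data
close_fit = evaluate(actual, [10.0, 12.0, 14.0, 18.0])
flat_fit = evaluate(actual, [8.0, 8.0, 8.0, 8.0])
```

In this toy case the flat forecast yields a negative R-squared because its squared error exceeds that of simply predicting the mean of the actual values, which is the same situation the negative R-squared values in the table indicate.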

Conclusion & Limitations

Overall, LSTM was the best-performing model by far. Given the stock data it received, it was able to capture temporal dependencies accurately. LSTM did, however, take the longest to compute, which should be taken into consideration. I did not measure the run time of each model, but LSTM was noticeably the slowest.

There are certainly improvements that can be made. For example, with ARIMA or Prophet, testing month-to-month data over a single year may produce better results. I could also have used different stock data, and I compared only three models. For ARIMA specifically, I could have used SARIMA or auto-ARIMA to search for the best parameters. I selected p, d, and q myself and may have gone amiss there.

The analysis was also strictly univariate: I examined the closing price and nothing else. While past closing prices are most likely a very big factor, there could be a myriad of other factors at play. A multivariate analysis that incorporates more features might yield more interesting results.